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Abstract — In  this  paper,  we  are  propose  a new  approach 
for  combining  item-based  collaborative  filtering  with  case 
based  reasoning  to  pursue  personalized  information 
filtering  in  a knowledge  sharing  context.  Functionally, 
our  personalized  information  filtering  approach  allows 
the  use  of  recommendations  by  peers  with  similar 
interests  and  domain  experts  to  guide  the  selection  of 
information  deemed  relevant  to  an  active  user's  profile. 
We  are  developing  Collaborative  Environments:  Fine- 
Grained  Knowledge  Sharing  a flexible  collaboration 
platform  that  enables  secure  and  focused  information 
sharing  across  organizations.  Collaborative 
Environments  uses  two  key  technologies  developed  at  ISI 
to  support  a new  concept  of  fine-grained  semantically 
controlled  information  visibility.  The  Hands 
infrastructure  provides  a semantic  network-based  data 
model,  search  and  filtering  capabilities,  distributed 
systems  support  and  fine-grained  control  of  resource 
visibility. 

Keywords — Advisor  Search,  Text  Mining,  Dirichlet 
Processes,  Graphical  Models. 

I.  INTRODUCTION 

Many  computer  security  problems  can  be  essentially 
reduced  to  separating  malicious  from  non-malicious 
activities.  This  is,  for  example,  the  case  of  spam  filtering, 
intrusion  detection,  or  the  identification  of  fraudulent 
behaviour.  But,  in  general,  defining  in  a precise  and 
computationally  useful  way  what  is  harmless  or  what  is 
offensive  is  often  too  complex.  To  overcome  these 
difficulties,  most  solutions  to  such  problems  have 
traditionally  adopted  a machine -learning  approach, 
notably  through  the  use  of  classifiers  to  automatically 
derive  models  of  (good  and/or  bad)  behavior  that  are  later 
used  to  recognize  the  occurrence  of  potentially  dangerous 
events. 
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Fig.  1:  Example  for  Knowledge  Sharing  In  A 
Collaborative  Environment [1  ] 


An  illustrative  toy  example  is  given  in  Figure  1.  One  can 
use  “tcpdump”  to  intercept  a sequence  of  Web  surfing 
activities  (IP  packets)  for  each  member.  The  scene  is, 
Alice  starts  to  surf  the  Web  and  wants  to  learn  how  to 
develop  a Java  multithreading  program,  which  has 
already  been  studied  by  Bob  (red  rectangle).  In  this  case, 
it  might  be  a good  idea  to  consult  Bob,  rather  than 
studying  by  herself.  We  aim  to  provide  such 

recommendations  by  analyzing  surfing  activities 

automatically.  In  this  example,  not  necessarily  Bob  is  an 
expert  in  every  aspect  of  Java  programming;  however, 
due  to  his  significant  surfing  activities  in  Java 
multithreading,  it  is  reasonable  to  assume  that  he  has 
gained  enough  knowledge  in  this  area  so  that  he  can  help 
Alice  (in  practice  we  could  set  a threshold  on  the  amount 
of  related  surfing  data  to  test  significance).  Even  if  Bob  is 
still  learning,  he  could  share  his  experiences  in  learning 
and  possibly  suggest  good  learning  materials  to  Alice, 
thus  saving  Alice’s  effort  and  time. 

Strictly  speaking,  KIDS’  idea  of  “learning  with  a secret” 
is  not  entirely  new:  Wang  et  al.  introduced  in  Anagram, 
another  payload-based  anomaly  detection  system  that 
addresses  the  evasion  problem  in  quite  a similar  manner. 
We  distinguish  here  between  two  broad  classes  of 
classifiers  that  use  a key.  In  the  first  group,  that  we  term 
randomized  classifiers,  the  classifier  is  entirely  public  (or, 
equivalently,  is  trained  with  public  information  only). 
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However,  in  detection  mode  some  parameters  (the  key) 
are  randomly  chosen  every  time  an  instance  has  to  be 
classified,  thus  making  uncertain  for  the  attacker  how  the 
instance  will  be  processed.  Note  that,  in  this  case,  the 
same  instance  will  be  processed  differently  every  time  if 
the  key  is  randomly  chosen.  We  emphasize  that 
randomization  can  also  be  applied  at  training  time, 
although  it  may  only  be  sufficiently  effective  when  used 
during  testing,  at  least  as  far  as  evasion  attacks  are 
concerned.  KIDS  belong  to  a second  group,  that  we  call 
keyed  classifiers.  The  practicality  of  various  types  of 
cryptanalytic  attacks  depends  on  many  factors:  Attacks 
based  on  few  ciphertext  are  better  than  attacks  that  require 
many  ciphertext,  known  plaintext  attacks  are  better  than 
chosen  plaintext  attacks,  no  adaptive  attacks  are  better 
than  adaptive  attacks,  single  key  attacks  are  better  than 
related  key  attacks,  etc.  Since  it  is  difficult  to  quantify  the 
relative  importance  of  all  these  factors  in  different 
scenarios,  we  usually  concentrate  on  the  total  running 
time  of  the  attack,  which  is  a single  well  defined  number. 

H.  EXISTING  SYSTEM 

The  major  problem  of  computing  optimal  strategies  to 
modify  an  attack  so  that  it  evades  detection  by  a Bayes 
classifier.  The  problem  can  be  formulated  in  game 
theoretic  terms,  where  each  modification  made  to  an 
instance  comes  at  a price,  and  successful  detection  and 
evasion  have  measurable  utilities  to  the  classifier  and  the 
adversary,  respectively.  The  authors  study  how  to  detect 
such  optimally  modified  instances  by  adapting  the 
decision  surface  of  the  classifier,  and  also  discuss  how 
the  adversary  might  react  to  this.  The  setting  used  in 
assumes  an  adversary  with  full  knowledge  of  the 
classifier  to  be  evaded.  Shortly  after,  how  evasion  can  be 
done  when  such  information  is  unavailable.  They 
formulate  the  adversarial  classifier  reverse  engineering 
problem  (ACRE)  as  the  task  of  learning  sufficient 
information  about  a classifier  to  construct  attacks, 
instead  of  looking  for  optimal  strategies.  The  authors  use 
a membership  oracle  as  implicit  adversarial  model:  the 
attacker  is  given  the  opportunity  to  query  the  classifier 
with  any  chosen  instance  to  determine  whether  it  is 
labelled  as  malicious  or  not.  Consequently,  a reasonable 
objective  is  to  find  in-stances  that  evade  detection  with 
an  affordable  number  of  queries.  ACRE  learnable  if 
there  exists  an  algorithm  that  finds  a minimal-cost  in- 
stance evading  detection  using  only  polynomial  many 
queries.  Similarly,  a classifier  is  ACRE  k-learnable  if  the 
cost  is  not  minimal  but  bounded  by  k.  Among  the  results 
given,  it  is  proved  that  linear  classifiers  with  continuous 
features  are  ACRE  k-learnable  under  linear  cost 
functions.  Therefore,  these  classifiers  should  not  be  used 


in  adversarial  environments.  Subsequent  work  by 
generalizes  these  results  to  convex-inducing  classifiers, 
showing  that  it  is  generally  not  necessary  to  reverse 
engineer  the  decision  boundary  to  construct  undetected 
instances  of  near-minimal  cost.  For  the  some  open 
problems  and  challenges  related  to  the  classifier  evasion 
problem.  Additional  works  have  revisited  the  role  of  ma- 
chine learning  in  security  applications,  with  particular 
emphasis  on  anomaly  detection. 

III.  SURVEY  REVIEW 

1.  Matthew  I.  Beal  show  that  it  is  possible  to  extend 
hidden  Markov  models  to  have  a count  ably  infinite 
number  of  hidden  states.  By  using  the  theory  of 
Dirichlet  processes  we  can  implicitly  integrate  out  the 
infinitely  many  transition  parameters,  leaving  only 
three  hyper  parameters  which  can  be  learned  from 
data.  These  three  hyper  parameters  define  a 
hierarchical  Dirichlet  process  capable  of  capturing  a 
rich  set  of  transition  dynamics.  The  three  hyper 
parameters  control  the  time  scale  of  the  dynamics,  the 
sparsity  of  the  underlying  state-transition  matrix,  and 
the  expected  number  of  distinct  hidden  states  in  a 
finite  sequence. 

2.  Krisztian  Balog  present  two  general  strategies  to 
expert  searching  given  a document  collection  which 
are  formalized  using  generative  probabilistic  models. 
The  first  of  these  directly  models  an  expert’s 
knowledge  based  on  the  documents  that  they  are 
associated  with,  whilst  the  second  locates  documents 
on  topic,  and  then  finds  the  associated  expert.  Forming 
reliable  associations  is  crucial  to  the  performance  of 
expert  finding  systems.  Consequently,  in  our 
evaluation  we  compare  the  different  approaches, 
exploring  a variety  of  associations  along  with  other 
operational  parameters. 

3.  Hongbo  Deng  present  three  models  for  expert  finding 
based  on  the  large-scale  DBLP  bibliography  and 
Google  Scholar  for  data  supplementation.  The  first,  a 
novel  weighted  language  model,  models  an  expert 
candidate  based  on  the  relevance  and  importance  of 
associated  documents  by  introducing  a document  prior 
probability,  and  achieves  much  better  results  than  the 
basic  language  model.  The  second,  a topic-based 
model,  represents  each  candidate  as  a weighted  sum  of 
multiple  topics,  whilst  the  third,  a hybrid  model, 
combines  the  language  model  and  the  topic-based 
model.  We  evaluate  our  system  using  a benchmark 
dataset  based  on  human  relevance  judgments  of  how 
well  the  expertise  of  proposed  experts  matches  a query 
topic. 
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4.  Anil  K.  Jain  provide  a brief  overview  of  clustering, 
summarize  well  known  clustering  methods,  discuss 
the  major  challenges  and  key  issues  in  designing 
clustering  algorithms,  and  point  out  some  of  the 
emerging  and  useful  research  directions,  including 
semi-supervised  clustering,  ensemble  clustering, 
simultaneous  feature  selection  during  data  clustering, 
and  large  scale  data  clustering.. 

5.  Jie  Bao  motivates  the  need  for  collaborative 
environments  for  ontology  construction,  sharing,  and 
usage;  identifies  the  desiderata  of  such  environments; 
and  proposes  package  based  description  logics  (P-DL) 
that  extend  classic  description  logic  (DL)  based 
ontology  languages  to  support  modularity  and 
(selective)  knowledge  hiding.  In  P-DL,  each  ontology 
consists  of  packages  (or  modules)  with  well-defined 
interfaces.  Each  package  encapsulates  a closely  related 
set  of  terms  and  relations  between  terms. 

IV.  KNOWLEDGE  BASES,  HAVE  SEVERAL 
IMPORTANT  CHARACTERISTICS 

1.  Constructing  large  ontologies  typically  requires 
collaboration  among  multiple  individuals  or  groups  with 
expertise  in  specific  areas,  with  each  participant 
contributing  only  a part  of  the  ontology.  Therefore, 
instead  of  a single,  centralized  ontology,  in  most  domains, 
there  are  multiple  distributed  ontologies  covering  parts  of 
the  domain. 

2.  Because  no  single  ontology  can  meet  the  needs  of  all 

users  under  every  conceivable  scenario,  the  ontology  that 
meets  the  needs  of  a user  or  a group  of  users’  needs  to  be 
assembled  from  several  independently  developed 

ontology  modules.  Since  different  ontologies  or  different 
modules  of  a single  ontology  are  developed  by  people 
with  diverse  points  of  view,  semantic  inconsistencies  or 
conflicts  between  such  modules  are  inevitable. 

Consequently,  in  collaborative  ontology  environments, 
there  is  a need  for  mechanisms  for  resolving  or  managing 
such  semantic  conflicts  to  ensure  that  the  resulting 
ontology  is  not  internally  inconsistent. 

3.  While  ontologies  are  often  used  to  facilitate  sharing  of 
knowledge,  data,  and  resources,  many  real-world 
scenarios  also  call  for  selectively  hiding  certain  parts  of 
an  ontology  (or  conversely,  selectively  sharing  certain 
parts  of  an  ontology).  The  need  for  knowledge  hiding 
may  arise  due  to  privacy  and  security  concerns,  or  for 
managing  and  knowledge  engineering  purposes. 

V.  KNOWLEDGE  HIDING  IN  ONTOLOGY 

In  this  Fikes  et  al.  mentioned  integration  of  modular 
ontology’s  in  the  Ontolingua  system  and  restricting 
symbol  access  to  public  or  private.  The  major  difference 
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between  our  approach  and  their  approach  is  that  we  use 
packages  not  only  as  modular  ontology  units,  but  also  in 
organizational  hierarchies,  therefore  enabling  the 
hierarchical  management  of  modules  in  collaborative 
ontology  building.  The  scope  limitation  modifier  idea  is 
an  extension  of  the  idea  of  symbol  access  restriction,  but 
it  is  more  flexible  and  expressive. 

Efforts  aimed  at  developing  formal  languages  to  control 
ontology  access  scope  include  Extensible  Access  Control 
Markup  Language  (XACML)  and  policy  languages. 
Giereth  studied  hiding  part  of  RDF,  where  sensitive  data 
in  an  RDF-graph  is  encrypted  for  a set  of  recipients,  while 
all  non-sensitive  data  remain  publicly  readable.  However, 
those  efforts  are  aimed  at  safe  access  on  language  or 
syntactic  level.  On  the  other  hand,  SLM  in  P-DL  aims  at 
knowledge  hiding  on  semantic  level,  where  the  hiding  is 
not  total,  but  partial,  i.e.,  hiding  semantics  can  still  be 
used  in  safe  indirect  inferences.  Farkas  studied  unwanted 
inferences  problem  in  semantic  web  data  on  XML,  RDF 
or  OWL  level.  Our  approach  to  SLM  and  concealable 
reasoning  is  a more  principled  formalism  to  avoid 
unwanted  inferences  and  with  better  defined  localized 
semantics. 

YI.  EXPERT  SEARCH 

Expert  search  aims  at  retrieving  people  who  have 
expertise  on  the  given  query  topic.  Early  approaches 
involve  building  a knowledge  base  which  contains  the 
descriptions  of  people’s  skills  within  an  organization. 
Expert  search  became  a hot  research  area  since  the  start  of 
the  TREC  enterprise  track  in  2005.  Balog  et  al.  proposed 
a language  model  framework  for  expert  search.  Their 
Model  2 is  a document  centric  approach  which  first 
computes  the  relevance  of  documents  to  a query  and  then 
accumulates  for  each  candidate  the  relevance  scores  of 
the  documents  that  are  associated  with  the  candidate.  This 
process  was  formulated  in  a generative  probabilistic 
model.  Balog  et  al.  showed  that  Model  2 performed  better 
and  it  became  one  of  the  most  prominent  methods  for 
expert  search.  Other  methods  have  been  proposed  for 
enterprise  expert  search,  but  the  nature  of  these  methods 
is  still  accumulating  relevance  scores  of  associated 
documents  to  candidates.  Expert  retrieval  in  other 
scenarios  has  also  been  studied,  e.g.  online  question 
answering  communities  academic  society. 

CONCLUSION 

Finally  we  conclude  that  our  study  of  fine-grained 
knowledge  sharing  in  collaborative  environments,  which 
identified  digging  out  fine  grained  knowledge  reflected  by 
people’s  interactions  with  the  outside  world  as  the  key  to 
solving  this  problem.  System  proposes  a two-step 
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framework  to  mine  fine-grained  knowledge  and 
integrated  it  with  the  classic  expert  search  method  for 
finding  right  advisors.  We  demonstrate  the  feasibility  of 
mining  task  micro-aspects  for  solving  this  knowledge 
sharing  problem.  We  leave  these  possible  improvements 
to  future  work.  Finally,  the  classic  expert  search  method 
is  applied  to  the  mined  results  to  find  proper  members  for 
knowledge  sharing.  Experiments  on  Web  surfing  data 
collected  from  our  lab  at  UCSB  and  IBM  show  that  the 
fine  grained  aspect  mining  framework  works  as  expected 
and  outperforms  baselines. 
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